Non-blocking minimum processes coordinated checkpointing for hierarchical computational grid
Similar resources
Hierarchical Coordinated Checkpointing Protocol
Coordinated checkpointing is a simple and useful protocol for fault tolerance in distributed systems on a LAN. However, the protocol's checkpoint overhead is bottlenecked by link speed: it increases even if only one link in the network is low-speed. In a metacomputing environment, where a distributed application communicates over a low-speed WAN,...
Blocking and non-blocking coordinated checkpointing for large scale MPI computation
Nowadays, clusters and grids are made of more and more computing nodes. The programming of multi-process applications is most often achieved through message passing. The increase in the number of processes implies that these applications need to use a fault tolerant message passing library. In this paper, we present two implementations of fault tolerant protocols based on MPICH, a blocki...
Blocking vs. non-blocking coordinated checkpointing for large-scale fault tolerant MPI Protocols
A long-term trend in high-performance computing is the increasing number of nodes in parallel computing platforms, which entails a higher failure probability. Fault tolerant programming environments should be used to guarantee the safe execution of critical applications. Research in fault tolerant MPIs has led to the development of several fault tolerant MPI environments. Different approaches a...
Minimum Process Coordinated Checkpointing Scheme for Ad Hoc Networks
The wireless mobile ad hoc network (MANET) architecture consists of a set of mobile hosts capable of communicating with each other without the assistance of base stations. This has made it possible to create a mobile distributed computing environment and has also brought several new challenges in distributed protocol design. In this paper, we study a very fundamental problem, the fault tol...
A Minimum-Process Coordinated Checkpointing Protocol For Mobile Distributed System
Mobile distributed systems raise issues such as mobility, low bandwidth of wireless channels, lack of stable storage on mobile nodes, disconnections, limited battery power, and a high failure rate of mobile nodes. These issues make traditional checkpointing techniques designed for distributed systems unsuitable for mobile environments. In this paper, we design a ...
Journal
Journal title: The International Conference on Electrical Engineering
Year: 2012
ISSN: 2636-4441
DOI: 10.21608/iceeng.2012.32714